efficient planning
Hybrid Search for Efficient Planning with Completeness Guarantees
Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often suffer from a lack of completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces. Specifically, we augment the high-level search with low-level actions to execute a multi-level (hybrid) search, which we call complete subgoal search. This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search. We apply the proposed search method to a recently proposed subgoal search algorithm and evaluate the algorithm trained on offline data on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even improve performance in terms of search expansions for instances that the high-level search could solve without low-level augmentations. Our approach makes it possible to apply subgoal-level planning for systems where completeness is a critical requirement.
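The idea of mixing subgoal proposals with low-level actions in one search can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a plain greedy best-first search, and the names `hybrid_search`, `propose_subgoals`, `low_level_successors` and the toy number-line problem are all hypothetical stand-ins chosen for illustration.

```python
import heapq

def hybrid_search(start, goal, propose_subgoals, low_level_successors,
                  heuristic, use_low_level=True, max_expansions=10_000):
    """Greedy best-first search over subgoals, optionally plus low-level actions.

    Returns (path, expansions); path is None if no solution was found.
    """
    frontier = [(heuristic(start), 0, start)]  # (priority, tie-breaker, state)
    parent = {start: None}
    counter = 1
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state = heapq.heappop(frontier)
        expansions += 1
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1], expansions
        # High-level successors come from the (possibly imperfect) subgoal generator.
        successors = list(propose_subgoals(state))
        # Low-level successors ensure every reachable state is eventually enqueued.
        if use_low_level:
            successors += list(low_level_successors(state))
        for nxt in successors:
            if nxt not in parent:
                parent[nxt] = state
                heapq.heappush(frontier, (heuristic(nxt), counter, nxt))
                counter += 1
    return None, expansions

# Toy problem (an assumption for illustration): states are integers 0..12.
GOAL = 7
subgoals = lambda s: [s + 3] if s + 3 <= 12 else []            # "learned" jumps of +3
actions = lambda s: [n for n in (s - 1, s + 1) if 0 <= n <= 12]  # unit steps
h = lambda s: abs(GOAL - s)

high_only, _ = hybrid_search(0, GOAL, subgoals, actions, h, use_low_level=False)
hybrid, n_expanded = hybrid_search(0, GOAL, subgoals, actions, h)
print(high_only)  # the +3 generator alone can never land on 7
print(hybrid)
```

In this toy setting the subgoal-only search exhausts its frontier without reaching the goal, while the hybrid variant still uses the long jumps where they help and falls back to a unit step to finish.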
Efficient Planning in Large MDPs with Weak Linear Function Approximation
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions and the effective horizon.
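To make the setting concrete, the sketch below shows how a linear value estimate and a generative oracle can be combined to pick an action by one-step lookahead. It assumes the weight vector `theta` is already given; it does not reproduce the paper's procedure for computing weights from core states. The names `greedy_action`, `simulate`, `phi` and the chain MDP are hypothetical.

```python
import random

def greedy_action(state, actions, simulate, phi, theta,
                  gamma=0.9, n_samples=32, rng=None):
    """Pick the action maximizing a sampled one-step lookahead of r + gamma * V(s')."""
    rng = rng or random.Random(0)

    def value(s):
        # Linear value function approximation: V(s) = <theta, phi(s)>.
        return sum(w * f for w, f in zip(theta, phi(s)))

    best_action, best_q = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(n_samples):
            reward, next_state = simulate(state, a, rng)  # generative oracle call
            total += reward + gamma * value(next_state)
        q = total / n_samples
        if q > best_q:
            best_action, best_q = a, q
    return best_action

# Toy chain MDP (an assumption for illustration): states 0..10, reward 1 on reaching 10.
def simulate(state, action, rng):
    next_state = min(10, max(0, state + action))
    return (1.0 if next_state == 10 else 0.0), next_state

phi = lambda s: [1.0, s / 10.0]  # two hand-picked features
theta = [0.0, 1.0]               # assumed weights: value grows toward the goal

chosen = greedy_action(4, [-1, +1], simulate, phi, theta)
print(chosen)
```

The cost of one call is proportional to the number of actions, samples and features, and does not depend on the number of states, which is the kind of scaling the abstract asks for.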
Mixed-Density Diffuser: Efficient Planning with Non-Uniform Temporal Resolution
Stambaugh, Crimson, Rao, Rajesh P. N.
Training a policy with online rollouts can be costly, dangerous, and sample-inefficient [1]. Alternatively, offline reinforcement learning (RL) involves a policy trained exclusively with pre-collected data. Extracting effective policies without exploration or feedback from the environment is challenging for conventional off-policy and even specialized offline RL algorithms [2, 3]. Approaches to offline RL are also frequently faced with the problem of incomplete or undirected demonstrations [4, 5, 6]. Offline algorithms must compose sub-trajectories from training data to generate advantageous behaviors. Another challenge is high dimensionality and long horizons, which make accurate planning and behavior cloning difficult [1]. Finally, sparse rewards pose a challenge to many training algorithms as they hinder accurate credit assignment to actions [7]. Diffusion models have emerged as a powerful framework for expressing complex, multi-modal distributions [8, 9]. Leveraging this model class, diffusion policies generate high-fidelity actions and use a value function for action selection [10, 11, 12].
Review for NeurIPS paper: Efficient Planning in Large MDPs with Weak Linear Function Approximation
Additional Feedback: After rebuttal: I read the author response and other reviews. It would be great to see more discussions in the next version. I think this paper has enough contribution and I will keep my original rating for acceptance. This paper is well written, and easy to follow. The main text is clean and the proof is deferred to the appendix.
Review for NeurIPS paper: Efficient Planning in Large MDPs with Weak Linear Function Approximation
All reviewers agree that the paper makes a nice contribution to planning with function approximation. In particular, the paper considers an important open problem, and while the problem is solved by making a few assumptions (most notably the core states), the results represent significant progress on this problem. The reviewers also appreciate the use of precise language and careful description of related work. Among the remaining concerns, R2 wants to see some evidence of robustness against the failure of the "core state" assumption. While performing empirical experiments may not fit the theoretical nature of the paper, the authors can consider a theoretical justification: namely, define a notion of error that measures how much the core-states assumption is violated, and show how such an error manifests itself in the final guarantee.
Diffusion Models as Optimizers for Efficient Planning in Offline RL
Huang, Renming, Pei, Yunqiang, Wang, Guoqing, Zhang, Yangming, Yang, Yang, Wang, Peng, Shen, Hengtao
Diffusion models have shown strong competitiveness in offline reinforcement learning tasks by formulating decision-making as sequential generation. However, the practicality of these methods is limited due to the lengthy inference processes they require. In this paper, we address this problem by decomposing the sampling process of diffusion models into two decoupled subprocesses: 1) generating a feasible trajectory, which is a time-consuming process, and 2) optimizing the trajectory. With this decomposition approach, we are able to partially separate efficiency and quality factors, enabling us to gain efficiency while still ensuring quality. We propose the Trajectory Diffuser, which utilizes a faster autoregressive model to handle the generation of feasible trajectories while retaining the trajectory optimization process of diffusion models. This allows us to achieve more efficient planning without sacrificing capability. To evaluate the effectiveness and efficiency of the Trajectory Diffuser, we conduct experiments on the D4RL benchmarks. The results demonstrate that our method achieves $3$-$10\times$ faster inference speed compared to previous sequence modeling methods, while also outperforming them in terms of overall performance.
https://github.com/RenMing-Huang/TrajectoryDiffuser
Keywords: Reinforcement Learning, Efficient Planning, Diffusion Model
Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments
Dedieu, Antoine, Lehrach, Wolfgang, Zhou, Guangyao, George, Dileep, Lázaro-Gredilla, Miguel
Despite their stellar performance on a wide range of tasks, including in-context tasks only revealed during inference, vanilla transformers and variants trained for next-token predictions (a) do not learn an explicit world model of their environment which can be flexibly queried and (b) cannot be used for planning or navigation. In this paper, we consider partially observed environments (POEs), where an agent receives perceptually aliased observations as it navigates, which makes path planning hard. We introduce a transformer with (multiple) discrete bottleneck(s), TDB, whose latent codes learn a compressed representation of the history of observations and actions. After training a TDB to predict the future observation(s) given the history, we extract interpretable cognitive maps of the environment from its active bottleneck(s) indices. These maps are then paired with an external solver to solve (constrained) path planning problems. First, we show that a TDB trained on POEs (a) retains the near-perfect predictive performance of a vanilla transformer or an LSTM while (b) solving shortest path problems exponentially faster. Second, a TDB extracts interpretable representations from text datasets, while reaching higher in-context accuracy than vanilla sequence models. Finally, in new POEs, a TDB (a) reaches near-perfect in-context accuracy, (b) learns accurate in-context cognitive maps, and (c) solves in-context path planning problems.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
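The TDB abstract above pairs an extracted cognitive map with an external solver. As a rough sketch of that last step only, the map can be treated as a graph over latent codes and handed to a breadth-first shortest-path search. The dictionary layout of `cognitive_map`, the function name `shortest_action_path`, and the 2x2 toy map are assumptions for illustration, not the representation used by the authors.

```python
from collections import deque

def shortest_action_path(cognitive_map, start, goal):
    """Breadth-first search over a map of the form {node: {action: next_node}}.

    Returns the shortest list of actions from start to goal, or None if unreachable.
    """
    queue = deque([start])
    back = {start: None}  # node -> (previous node, action taken), None for the start
    while queue:
        node = queue.popleft()
        if node == goal:
            plan = []
            while back[node] is not None:
                node, action = back[node]
                plan.append(action)
            return plan[::-1]
        for action, nxt in cognitive_map.get(node, {}).items():
            if nxt not in back:
                back[nxt] = (node, action)
                queue.append(nxt)
    return None

# Toy map (an assumption for illustration): four latent codes laid out as a 2x2 grid.
toy_map = {
    0: {"right": 1, "down": 2},
    1: {"left": 0, "down": 3},
    2: {"up": 0, "right": 3},
    3: {"left": 2, "up": 1},
}

plan = shortest_action_path(toy_map, 0, 3)
print(plan)
```

Once the map is explicit, planning costs time linear in the number of map edges, rather than requiring repeated rollouts of a sequence model.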